Cost-Effective HITs for Relative Similarity Comparisons

Authors

  • Michael J. Wilber
  • Iljung S. Kwak
  • Serge J. Belongie
Abstract

Similarity comparisons of the form “Is object a more similar to b than to c?” form a useful foundation in several computer vision and machine learning applications. Unfortunately, an embedding of n points is only uniquely specified by n^3 triplets, making collecting every triplet an expensive task. Noticing this difficulty, other researchers have investigated more intelligent triplet sampling techniques, but they do not study their effectiveness or their potential drawbacks. Although it is important to reduce the number of collected triplets needed to generate a good embedding, it is also important to understand how best to display a triplet collection task to the user so that it better respects the worker’s human constraints. In this work, we explore an alternative method for collecting triplets and analyze its financial cost, collection speed, and worker happiness as a function of the final embedding quality. We propose best practices for creating cost-effective human intelligence tasks for collecting triplets. We show that rather than changing the sampling algorithm, simple changes to the crowdsourcing UI can drastically decrease the cost of collecting similarity comparisons. Finally, we provide a food similarity dataset as well as the labels collected from crowd workers.
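
To make the cost of exhaustive collection concrete, here is a minimal sketch that enumerates every question of the form “Is object a more similar to b than to c?” for n objects. It is an illustration only, not the authors' code: the function names are hypothetical, and it assumes that swapping b and c yields the same question, so each reference object a is paired with every unordered pair {b, c} of the remaining objects.

```python
from itertools import combinations


def all_triplet_questions(n):
    """Yield (a, b, c) for every distinct question
    'is object a more similar to b than to c?' over n objects.

    Assumption: (a, b, c) and (a, c, b) are the same question,
    so b and c are drawn as an unordered pair with b < c.
    """
    for a in range(n):
        others = [i for i in range(n) if i != a]
        for b, c in combinations(others, 2):
            yield (a, b, c)


def count_triplet_questions(n):
    """Closed-form count: n choices of a times C(n - 1, 2) pairs {b, c}."""
    return n * (n - 1) * (n - 2) // 2


if __name__ == "__main__":
    for n in (5, 10, 100):
        print(n, count_triplet_questions(n))
```

Under this assumption the count is n(n-1)(n-2)/2, which grows cubically in n. For 100 objects that is already 485,100 questions, which is why asking crowd workers every one of them quickly becomes expensive.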

Similar resources

Query Routing using Query Feedback and Similarity in Unstructured Peer-to-peer Networks

In this paper, we propose a query-based query routing approach for unstructured peer-to-peer networks. We consider two parameters for selectively routing queries in the network. The parameters are based on the recent past query and the similarity of the past query with the query to be routed. The objective of our approach is to have a low cost but effective routing approach in unstructured...

Transition Potential Modeling of Land-Cover based on Similarity Weighted Instance-based Learning Procedure and Its Implication in the REDD Project Design Document

Reducing Emissions from Deforestation and Forest Degradation (REDD) is a climate change mitigation strategy employed to reduce the intensity of deforestation and GHG emissions. In recent decades, drastic land use changes in Mazandaran province caused a substantial reduction in the amount of Hyrcanian forests. The present research based on objectives of REDD projects paid to identify of fore...

Stability and Similarity of Link Analysis Ranking Algorithms

Recently, there has been a surge of research activity in the area of link analysis ranking, where hyperlink structures are used to determine the relative authority of webpages. One of the seminal works in this area is that of Kleinberg [Kleinberg 98], who proposed the HITS algorithm. In this paper, we undertake a theoretical analysis of the properties of the HITS algorithm on a broad class of r...

Novel methods of measuring the similarity and distance between complex fuzzy sets

This thesis develops measures that enable comparisons of subjective information that is represented through fuzzy sets. Many applications rely on information that is subjective and imprecise due to varying contexts and so fuzzy sets were developed as a method of modelling uncertain data. However, making relative comparisons between data-driven fuzzy sets can be challenging. For example, when da...

An Improved HITS Algorithm Based on Page-query Similarity and Page Popularity

The HITS algorithm is a very popular and effective algorithm to rank web documents based on the link information among a set of web pages. However, it assigns every link the same weight. This assumption results in topic drift. In this paper, we first define the generalized similarity between a query and a page, and the popularity of a web page. Then we propose a weighted HITS algorithm w...

Journal title:
  • CoRR

Volume abs/1404.3291

Publication date 2014